Classification Algorithms for Big Data Analysis, a Map Reduce Approach

Authors

  • V. A. Ayma
  • R. S. Ferreira
Abstract

For many years, the scientific community has been concerned with how to increase the accuracy of classification methods, and major achievements have been made. Beyond this issue, the increasing amount of data generated every day by remote sensors raises further challenges. This work presents a tool within the scope of the InterIMAGE Cloud Platform (ICP), an open-source, distributed framework for automatic image interpretation. The tool, named ICP: Data Mining Package, performs supervised classification on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. It implements four classification algorithms taken from WEKA's machine learning library: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes and for different cluster configurations demonstrate the potential of the tool, as well as aspects that affect its performance.
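
The map/reduce pattern mentioned in the abstract can be illustrated with a minimal, single-machine sketch. This is not the ICP: Data Mining Package itself, which the abstract describes as built on Hadoop MapReduce and WEKA. The names `classify`, `map_phase`, `reduce_phase` and `run_job`, the fixed weights, and the sample records are all hypothetical and serve only to show how per-partition classification (map) and aggregation of predicted labels (reduce) might fit together.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Hypothetical linear decision function standing in for a trained SVM.
# The weights and bias are illustrative assumptions, not learned values.
WEIGHTS = (1.0, -1.0)
BIAS = 0.0


def classify(features: Sequence[float]) -> str:
    """Assign a label from the sign of a linear score."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return "class_a" if score >= 0 else "class_b"


def map_phase(partition: Iterable[Sequence[float]]) -> List[Tuple[str, int]]:
    """Map step: classify each record and emit a (label, 1) pair."""
    return [(classify(record), 1) for record in partition]


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the emitted counts per label."""
    counts: Counter = Counter()
    for label, n in pairs:
        counts[label] += n
    return counts


def run_job(partitions: Iterable[Iterable[Sequence[float]]]) -> Counter:
    """Run map over every partition, then reduce the combined output.

    On a real cluster each partition would be mapped on a different node;
    here the loop is sequential for illustration.
    """
    mapped = [map_phase(p) for p in partitions]
    shuffled = [pair for part in mapped for pair in part]
    return reduce_phase(shuffled)


# Two hypothetical data partitions of two-feature records.
partitions = [
    [(2.0, 1.0), (0.5, 3.0)],
    [(4.0, 4.0), (1.0, 2.5), (3.0, 0.0)],
]
result = run_job(partitions)
print(dict(result))
```

Because each partition is classified independently, the map step is the part that would scale out with additional cluster nodes, while the reduce step only handles small (label, count) pairs.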

Similar resources

Sentiment Analysis of Social Networking Data Using Categorized Dictionary

Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding the correct opinion or interest in multi-faceted sentiment data is a tedious task. This paper proposes a method to improve sentiment accuracy by using a categorized dictionary for sentiment classification and analysis. A categorized dictiona...

Market Basket Analysis Algorithm on Map/Reduce in AWS EC2

As the web, social networking, and smartphone applications have become popular, data has grown drastically every day. Such data is called Big Data. Google encountered Big Data earlier than others and recognized the importance of its storage and computation. Thus, Google implemented its parallel computing platform with the Map/Reduce approach on Google Distributed File Systems (GFS) in ord...

Micro-classification of orchards and agricultural croplands by applying object based image analysis and fuzzy algorithms for estimating the area under cultivation

Remote sensing technology is one of the most efficient and innovative technologies for agricultural land use/cover mapping. In this regard, object-based image analysis (OBIA) is known as a new method of satellite image processing that integrates spatial and spectral information. This approach makes use of spectral, environmental, physical and geometrical characte...

Classification in Data Mining and Analysis of Fuzzy Based Methods –big Data Analytics

Data mining comprises various applications, specifically including biological and biomedical data, which continue to be a research-oriented task in the bioinformatics field. Today, big data applications receive much attention because of the enormous increase in data and its storage. This leads to a major problem when extracting knowledge or information by processing huge amounts of data. ...

Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing

The Map/Reduce approach has been popular for computing huge volumes of data since Google implemented its platform on Google Distributed File Systems (GFS) and Amazon Web Services (AWS) began providing its services with the Apache Hadoop platform. Map/Reduce motivates redesigning and converting existing sequential algorithms into Map/Reduce algorithms for big data, so the paper presents Market Bas...

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where many low-level representations of images exist, e.g., SIFT, HOG and so on. However, there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...

Publication date: 2015